Computational methods for the analysis of early-pregnancy brain ultrasonography: a systematic review

Summary
Background Early screening of the brain is becoming routine clinical practice. Currently, this screening is performed by manual measurements and visual analysis, which is time-consuming and prone to errors. Computational methods may support this screening. Hence, the aim of this systematic review is to gain insight into future research directions needed to bring automated early-pregnancy ultrasound analysis of the human brain to clinical practice.
Methods We searched PubMed (Medline ALL Ovid), EMBASE, Web of Science Core Collection, Cochrane Central Register of Controlled Trials, and Google Scholar, from inception until June 2022. This study is registered in PROSPERO at CRD42020189888. Studies about computational methods for the analysis of human brain ultrasonography acquired before the 20th week of pregnancy were included. The key reported attributes were: level of automation, learning-based or not, the usage of clinical routine data depicting normal and abnormal brain development, public sharing of program source code and data, and analysis of the confounding factors.
Findings Our search identified 2575 studies, of which 55 were included. Of these, 76% used an automatic method, 62% used a learning-based method, 45% used clinical routine data, and 13% additionally used data depicting abnormal development. None of the studies publicly shared their program source code and only two studies shared their data. Finally, 35% did not analyse the influence of confounding factors.
Interpretation Our review showed an interest in automatic, learning-based methods. To bring these methods to clinical practice we recommend that studies: use routine clinical data depicting both normal and abnormal development, make their dataset and program source code publicly available, and be attentive to the influence of confounding factors. Introduction of automated computational methods for early-pregnancy brain ultrasonography will save valuable time during screening, and ultimately lead to better detection, treatment and prevention of neuro-developmental disorders.
Funding The Erasmus MC Medical Research Advisor Committee (grant number: FB 379283).


Introduction
The rapid development of ultrasound techniques since their introduction in 1956 led to the implementation of prenatal two-dimensional (2D) ultrasonography in the 1970s. 1,2 2D ultrasonography is used for second trimester congenital anomaly screening worldwide and serves as an important baseline with regard to growth and development. 3 Three-dimensional (3D) ultrasonography for prenatal diagnosis became available in the late 1980s, after the necessary improvement in computer technology and the introduction of transvaginal ultrasound probes. 4 3D ultrasonography has had a major impact on the visualization of the embryo and fetus in the first trimester. Furthermore, 3D ultrasound enables accurate biometric and volumetric measurements of structures that are hard to assess in 2D due to irregular and/or asymmetrical shapes.
During the first trimester of pregnancy, the brain is already clearly visible in ultrasonography, and its growth and structural development continue throughout pregnancy. 5 The DOHAD paradigm (Developmental Origins of Health and Disease) states that there is a strong association between fetal growth and development and health and disease later in life. 6 Between 9 and 11 weeks gestational age, prenatal brain development was found to be associated with maternal age, smoking, mode of conception and folate status. [7][8][9] This highlights the importance of monitoring the development of the early brain, which is reflected in the 2021 recommendation of the International Society of Ultrasound in Obstetrics and Gynecology (ISUOG) to perform a neuro-sonographic examination in the first trimester, since this examination provides information regarding the etiology and pathophysiology of normal and abnormal development of the human brain. 10
The ISUOG recommends performing the neurosonographic examination using a 3D trans-vaginal probe. However, when this is not feasible the examination can be performed using a 3D or 2D trans-abdominal probe. 3D ultrasonography is not always feasible due to unavailability of the equipment or lack of a trained sonographer to acquire and/or analyse the image. 10 The recommended examination of the brain during the first trimester consists of measuring the biparietal diameter, head circumference, atrial width of the lateral ventricles and transverse cerebellar diameter. However, as pointed out by Volpe et al., by following this recommendation the majority of brain abnormalities remain undiagnosed until the second trimester. 11 Few studies showed how to best assess the brain during the first trimester using 3D ultrasonography and how major abnormalities are characterized. 11,12 However, monitoring growth and development and screening for abnormalities using 2D or 3D ultrasound scans is time-consuming, prone to human errors and requires specific expertise. Automatic analysis may save time, reduce errors, and allow for taking multiple measurements at the same time. Artificial Intelligence

Research in context
Evidence before this study
Early-pregnancy brain ultrasonography before 20 weeks is becoming routine clinical practice thanks to advances in ultrasound techniques, such as the introduction of high-frequency ultrasound probes and three-dimensional (3D) ultrasound, which enable early visualization of the human brain. However, monitoring growth and development and screening for abnormalities using ultrasound scans is time-consuming and prone to human errors. Automatic analysis will save time, reduce errors, and allow for multiple measurements to be taken at the same time. Hence, the aim of this systematic review was to gain insight into the future research directions needed to bring automated early-pregnancy brain ultrasonography analysis to clinical practice. To achieve this, we searched PubMed (Medline ALL Ovid), EMBASE, Web of Science Core Collection, Cochrane Central Register of Controlled Trials, and Google Scholar, from inception until June 2022. We included studies using ultrasound scans acquired before the 20th week of pregnancy. We included only full research papers written in English; protocols, review papers, conference abstracts, and case reports were excluded. There are several other systematic reviews focusing on computational methods for prenatal imaging. However, these reviews all focused on scans acquired during mid- and late-pregnancy and/or scans acquired with magnetic resonance imaging (MRI), and were not focused on the brain specifically.

Added value of this study
In this review, we created an overview of the future research directions needed to bring automated early-pregnancy brain ultrasonography analysis to clinical practice. The studies fitted in the following topics: biometry, standard plane detection, segmentation, growth models, visualization, abnormality detection and quality enhancement. The key reported attributes were: level of automation, learning-based or not, the usage of clinical routine data depicting normal and abnormal development, public sharing of program source code and data, and analysis of confounding factors. We found that the majority of the studies described the development of an automatic, learning-based method (62%). The most studied topic was biometry (40%), followed by standard plane detection (29%), segmentation (16%), growth models (7%), visualization (4%), abnormality detection (2%), and quality enhancement (2%). The majority of the studies did not use data from routine clinical care (55%). We found that none of the studies made their program source code publicly available and only two studies made the ultrasound data used publicly available. Finally, 35% of the studies did not analyse the influence of confounding factors and only 7% performed additional analyses for confounding factors beyond gestational age, image quality and body mass index.

Implications of all the available evidence
The findings of this systematic review show that there is an interest in automatic analysis of early-pregnancy brain ultrasonography. To bring this analysis to clinical practice we recommend that studies: use routine clinical data depicting both normal and abnormal development, make their dataset and program source code publicly available, and be attentive to the influence of confounding factors. Automatic methods have the potential to drastically reduce the time needed in clinical practice for measurements of the brain and for the detection of structural abnormalities. Furthermore, automatic analysis enables the development of large-scale data-driven models. These models have the potential to provide insights into the factors influencing growth and development, which in turn may lead to early diagnosis, treatment, and prevention of neuro-developmental disorders.
(AI) has already been shown to enable automatic analysis of images in several medical applications and can be applied to its full potential to first trimester ultrasound, as the whole embryo, thanks to its limited size, can be imaged in one dataset. 13,14 Hence, we argue that automated analysis of ultrasonography offers an opportunity to bring early brain ultrasonography to clinical practice. The systematic review by Liu et al. showed that there is interest in developing AI for medical ultrasound analysis in different domains, but this interest is hampered by the low imaging quality of ultrasound due to noise and artifacts, and the limited amount of publicly available medical ultrasound data. 15 Looking at related work, several systematic reviews on computational methods for prenatal imaging have been performed. Most closely related is the work by Fiorentino et al., who reviewed deep learning methods for fetal ultrasound of all gestational ages and all organs. 16 Other reviews focused on mid- and late-pregnancy, included only fully automatic methods, or were based on MR images. [17][18][19][20] However, since MRI is not the standard modality in clinical practice and mainly manual or semi-automatic methods are used to analyse the acquired images, we found these reviews too limited.
Given the potential impact of automated early-pregnancy ultrasound analysis and the lack of a systematic review covering all methods for this crucial period, we performed a systematic review covering all types of computational methods for the analysis of early-pregnancy brain ultrasonography. By creating this overview, we aim to gain insight into future research directions needed to bring automated early-pregnancy brain ultrasonography analysis to clinical practice.

Search strategy and selection criteria
This systematic review adheres to the PRISMA guidelines and was registered a priori at the PROSPERO registry (CRD42020189888). 21 The specific search strategy was created together with a Health Sciences Librarian with expertise in systematic review searching. Literature search strategies were developed using medical subject headings (MeSH). We searched PubMed (Medline ALL Ovid), EMBASE, Web of Science Core Collection, Cochrane Central Register of Controlled Trials, and Google Scholar. We searched the databases from inception until June 2022. To ensure literature saturation, we scanned the reference lists of included studies, relevant reviews identified through the search and full paper proceedings of relevant international scientific conferences. Search terms used and the list of screened conference proceedings are given in Supplementary Material 1.
Literature search results were uploaded in Endnote. Two authors (WB, MR) independently screened the titles and abstracts obtained by the search against the inclusion criteria; any disagreement was resolved through discussion. Full papers were obtained for all titles that appeared to meet the inclusion criteria or when there was any uncertainty. One author (WB) screened the full papers and decided whether these met the inclusion criteria; in case of doubt the paper was discussed by WB and MR. Neither of the review authors was blinded to the journal titles or to the study authors or institutions.
Studies were selected according to the criteria outlined below. We included computational methods developed for human prenatal ultrasonography of the brain. Initially, we performed a broad search not restricted to brain ultrasonography. After title and abstract screening we obtained over 300 inclusions, which was too many for a full text screening. Therefore, we decided to restrict ourselves to studies involving the brain only. Studies were excluded when the gestational age (GA) window of the study did not start before the 20th week of pregnancy or when the target structure of the study was not the brain or a brain structure. No restrictions on the type of data acquisition, study design, and number of subjects included in the study were applied. We included only full research papers written in English; protocols, review papers, conference abstracts, and case reports were excluded.

Data analysis
The extracted data consisted of the following:
• year of publication;
• title;
• brain structures studied;
• level of automation: manual, semi-automatic, or automatic. Here, manual refers to methods where computation can only be done after a manual action of the operator, and semi-automatic refers to methods where computations are performed in interaction with the operator. For automatic methods no actions of the operator are needed;
• type of method: non-learning based, machine learning, or deep learning;
• for learning-based methods, the learning strategy: whether cross-validation, an external test set, and/or data augmentation was used, and who provided the annotations used for learning. Additionally, for non-learning based methods we report here whether an external dataset was used for evaluation;
• type of US used: 2D slices or 3D volumes;
• ultrasonography machine and probe frequency.
We divided all studies into the following topics: abnormality detection, biometry, growth models, segmentation, standard plane detection, quality enhancement and visualization. In abnormality detection studies the aim is to distinguish ultrasound images depicting abnormal development from images depicting normal development. Biometry studies focus on performing biometric and volumetric measurements of relevant structures within the embryonic and fetal brain. Growth model studies focus on models that describe the relationship between growth and development of the entire brain or of specific brain structures and GA. Segmentation studies focus on delineating the brain or brain structures in the ultrasound images. Standard plane detection studies focus on the detection of standard planes within the brain. Quality enhancement studies focus on improving the image quality. Finally, visualization studies focus on computational methods that visualize the embryonic or fetal brain.
When studies performed tasks from multiple topics, they were classified in the category of the final topic. For example, biometric measurements are performed in standard planes, so studies that focus on biometry subsequently to standard plane detection are classified as biometry studies.
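The classification rule above can be illustrated with a minimal sketch. The function name `classify_study` and the representation of a study as an ordered list of task names are assumptions made for illustration only; they are not part of the review protocol.

```python
# Topics used in this review (names taken from the text).
TOPICS = (
    "abnormality detection",
    "biometry",
    "growth models",
    "segmentation",
    "standard plane detection",
    "quality enhancement",
    "visualization",
)


def classify_study(tasks_in_order):
    """Return the topic of a study given its tasks in processing order.

    Hypothetical helper: a study that performs several tasks is assigned
    to the topic of the final task in its pipeline.
    """
    if not tasks_in_order:
        raise ValueError("a study must perform at least one task")
    unknown = [task for task in tasks_in_order if task not in TOPICS]
    if unknown:
        raise ValueError(f"unknown topic(s): {unknown}")
    return tasks_in_order[-1]


# Example from the text: plane detection followed by biometry.
example_topic = classify_study(["standard plane detection", "biometry"])
```

Under this reading, a study that detects a standard plane only as a preprocessing step for a measurement is counted once, as a biometry study.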
To assess the risk of bias of the studies included in this review, the ErasmusAGE quality score was used: a tool composed of five items based on previously published scoring systems that can be adapted to fit the topic of the review. 22 Each of the five items can be allocated either zero, one or two points. The final score is the sum of the points given for each item, resulting in a total score between zero and ten. The five items are: Q1 Study design: cross-sectional (0), longitudinal (1), intervention studies (2); Q2 Number of subjects used for validation, the study size: ≤35 (0), 35 to 250 (1), ≥250 (2); Q3 Description of the computational method: not reproducible based on description (0), key results are reproducible based on description (1), all results are reproducible based on description (2); Q4 Reporting of the outcome: inadequate (0), qualitative and/or quantitative outcome reported (1), additionally: multiple raters and/or comparison to known clinical outcome (2); Q5 Influence of confounding factors: not investigated (0), findings are analysed or adjusted for at least one of the key confounders (the influence of GA, acquisition quality and body mass index) (1), additional analysis or adjustment for confounding factors was performed (2).
Intervention studies are not applicable in this review; therefore the highest possible score is 9. The boundaries of the scoring for study size were determined by calculating the first quartile (Q1), median and third quartile (Q3) over the included full-text papers. We have chosen to evaluate only the number of subjects used for validation, rather than the total number available, since learning-based methods typically need a lot of data for development. However, the quality of the studies is generally determined by how much data is used for validation of the method, regardless of the method used. The complete quality scoring system used can be found in Supplementary Material 2.
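As a rough illustration of how the adapted quality score is assembled, the sketch below sums the five items described above. The function and parameter names are hypothetical. The handling of the exact boundary values 35 and 250 is an assumption, since the text lists the study-size bands as "≤35", "35 to 250" and "≥250". Items Q3 to Q5 are passed in as points already assigned by a reviewer.

```python
# Q1: points per study design, as listed in the text.
DESIGN_POINTS = {"cross-sectional": 0, "longitudinal": 1, "intervention": 2}


def study_size_points(n_validation, lower=35, upper=250):
    """Q2: points for the number of subjects used for validation.

    Assumption: exactly 35 subjects scores 0 and exactly 250 scores 2;
    the source text leaves these boundary cases ambiguous.
    """
    if n_validation <= lower:
        return 0
    if n_validation < upper:
        return 1
    return 2


def quality_score(design, n_validation, method_points, outcome_points,
                  confounder_points):
    """Sum the five items (Q1 to Q5), each worth 0, 1 or 2 points."""
    reviewer_items = {
        "method description (Q3)": method_points,
        "outcome reporting (Q4)": outcome_points,
        "confounding factors (Q5)": confounder_points,
    }
    for name, points in reviewer_items.items():
        if points not in (0, 1, 2):
            raise ValueError(f"{name} must be 0, 1 or 2, got {points}")
    return (DESIGN_POINTS[design]
            + study_size_points(n_validation)
            + sum(reviewer_items.values()))


# A longitudinal study with full marks on Q2 to Q5 reaches the maximum
# attainable in this review, because intervention designs do not occur.
best_possible = quality_score("longitudinal", 300, 2, 2, 2)
```

Because the design item can contribute at most one point here, the practical ceiling of the sum is 9 rather than 10.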

Statistics
No statistical tests were used. The study size in Q2 (see above) was determined as the number of subjects used for validation of the method.

Role of funding source
The funder of the study had no role in this systematic review.

Included studies
The flowchart in Fig. 1 summarizes the literature search and selection of studies. Initially, 2545 potentially eligible studies were identified through the database search, and an additional 30 potentially eligible studies were identified through other sources. After title and abstract screening 105 studies remained, and their full text was subsequently screened. We recorded the reasons for exclusion after full text screening in Fig. 1 and in Table S1 in Supplementary Material 3. After full-text screening we included 55 studies in the systematic review.
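The counts in this paragraph can be checked with a short sketch. The class name `ScreeningFlow` is hypothetical, and the derived exclusion counts are simple differences. They assume that no separate deduplication step sits between identification and screening, which the text does not describe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningFlow:
    """Counts reported for the study selection (hypothetical container)."""
    from_databases: int
    from_other_sources: int
    after_title_abstract: int
    included: int

    @property
    def identified(self):
        # Total number of records identified by the search.
        return self.from_databases + self.from_other_sources

    @property
    def excluded_title_abstract(self):
        # Assumption: every record not kept was excluded at this stage.
        return self.identified - self.after_title_abstract

    @property
    def excluded_full_text(self):
        return self.after_title_abstract - self.included


flow = ScreeningFlow(from_databases=2545, from_other_sources=30,
                     after_title_abstract=105, included=55)
print(flow.identified, flow.excluded_title_abstract, flow.excluded_full_text)
```

The first value matches the 2575 studies reported in the Summary; the other two are derived here rather than quoted from the source.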

Study characteristics
In Table 1 the main characteristics of the included studies are given. We included 22 studies on biometry, 16 studies on standard plane detection, 9 studies on segmentation, 4 studies on growth models, 1 study on abnormality detection, 1 study on quality enhancement and 2 studies on visualization. Of the included studies 4 (6%) described a manual, 9 (16%) a semi-automatic, and 42 (76%) an automatic method. Regarding the type of method, 14 (25%) studies used a non-learning based method, 34 (62%) studies used a learning-based method, of which 10 (18% of the studies) used machine learning and 24 (44% of the studies) deep learning, and for 7 (13%) studies the type of method could not be identified based on the text. All seven of these studies used proprietary software; of the other 10 studies using proprietary software, one was learning-based. Regarding the type of data used, we found in Table 1 that for 8 (15%) studies it was unclear what kind of data was used, 22 (40%) studies used data acquired for research purposes, and 25 (45%) studies used clinical routine data depicting normal development. Finally, 7 (13%) studies additionally used clinical routine data depicting abnormal development and 1 study used data acquired for research purposes depicting abnormal development. None of the studies shared the program source code publicly, and two studies shared the data. 23,44 Sixteen (29%) studies reported the computational time, and 7 (13%) studies reported what kind of computational hardware was used. Computation time ranged from 70 microseconds to 25 min. 32,35 Next, we discuss the studies per topic in the following order: biometry, standard plane detection, segmentation, and together: growth models, abnormality detection, quality enhancement and visualization.

Standard plane detection
For standard plane detection the detailed information is given in Table 3. All 16 studies used automatic methods. In 11 out of the 16 studies the trans-cerebellar (TC), trans-thalamic (TT) and/or trans-ventricular (TV) plane was detected. Fig. 3 gives an overview of the detection accuracy and the GA range. 23 The other studies detected the brain, 35,[65][66][67] or other standard planes. 28,37 Only the study by Bastiaansen et al. was applied to first trimester data (<14 weeks GA). 65

Segmentation
Table 4 gives the detailed information about the segmentation studies. The fetal head in the TV plane was segmented by five studies. 38,40,[68][69][70] Three of those used the aforementioned HC18 challenge dataset. 40,44,68,69 Furthermore, four studies segmented the cerebellum, 39,41,71,72 and three studies segmented the choroid plexus. 38,39,72 All these studies used a learning-based method, and four studies used data acquired in the first trimester (<14 weeks). 38,40,68,69 Fig. 4 gives an overview of the Dice score and GA range for all studies.
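For readers less familiar with the Dice score used to compare the segmentation studies, the following is a minimal sketch of the standard definition, 2|A∩B| / (|A| + |B|), for two masks given as sets of foreground pixel coordinates. The function name and the convention for two empty masks are assumptions; individual studies may compute the score on arrays or volumes instead.

```python
def dice_score(predicted, reference):
    """Dice overlap between two masks given as collections of coordinates.

    Returns a value between 0.0 (no overlap) and 1.0 (identical masks).
    """
    predicted, reference = set(predicted), set(reference)
    total = len(predicted) + len(reference)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2 * len(predicted & reference) / total


# Toy 2D example: the masks share one of their two foreground pixels.
automatic_mask = {(0, 0), (0, 1)}
manual_mask = {(0, 1), (1, 1)}
overlap = dice_score(automatic_mask, manual_mask)
```

In the toy example the masks share one pixel out of four in total, giving a score of 0.5.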
Growth models, abnormality detection, quality enhancement and visualization

2D versus 3D ultrasonography
We found 23 (43%) studies using 2D ultrasonography and 31 (57%) studies using 3D ultrasonography. The studies using 2D data mainly used automatic methods (96%) and made use of deep learning (65%). For the studies using 3D data there was no dominant type of method: there were 19 (61%) automatic, 9 (29%) semi-automatic and 3 (10%) manual methods, of which 8 (26%) used deep learning, 6 (19%) machine learning, and 12 (39%) were not learning-based. Only 9 (29%) of the studies using 3D ultrasonography mentioned that the ultrasound was acquired trans-vaginally, as recommended by the ISUOG. For the most studied topics (biometry, standard plane detection and segmentation), the included studies used both 2D and 3D ultrasonography. For biometry, 2D ultrasonography was used in 45% of the studies and 3D ultrasonography in 55%; for standard plane detection this was 33% versus 66%, and for segmentation 55% versus 45%.
(Displaced table text: A CNN was trained to register an image to an atlas. The atlas consisted of an ultrasound image put in a pre-defined orientation and the brain was segmented. By learning the correspondence between image and atlas the fetal brain was detected.)

Risk of bias
For all included studies we found quality scores between 2 and 8, with a median of 5. The total quality score for each study can be found in Table 1, and the scores given per item can be found in Table S2 in Supplementary Material 5. There were only four (7%) studies using longitudinal data 29,48,49,74 and only two of the studies gave an unreproducible description of their method. 67,76 Furthermore, 27 (49%) studies had, besides qualitative and/or quantitative reporting of outcome, additionally multiple raters or compared their result to known clinical outcomes. Regarding the influence of confounders, 19 (35%) studies did not adjust for or analyse the influence of at least one of the key confounders (GA, acquisition quality or body mass index), and only four (7%) studies performed an analysis to identify or adjust for additional confounders, such as challenging fetal position, abdominal scarring and uterine fibroids; 67 fetal position, maternal body habitus and prior uterine surgery; 27 maternal age, pregnancy duration, birthweight and number of ultrasound examinations; 49 and maternal age and fetal position. 62
Fig. 3: Overview of detection accuracy of the trans-cerebellar plane (TC), trans-thalamic plane (TT) and trans-ventricular plane (TV) per week gestational age. The thickness of the bar indicates the number of subjects used for validation: thinnest: <35 subjects, middle: between 35 and 250 subjects, thickest: >250 subjects. A white bar with a black edge indicates that no accuracy was reported. The accuracy is shown up to the 20th week, since we were interested in the performance during early pregnancy.

Discussion
First trimester 3D ultrasonography screening of the brain has the potential for early detection of major abnormalities. 11 This is supported by the recent recommendation of the ISUOG to perform a 3D, or if not feasible a 2D, neuro-sonographic screening. However, this screening currently relies on manual measurements and visual inspection of the ultrasound scans, which is time-consuming, prone to human errors, and requires additional imaging and interpretation expertise that is not always present in clinical practice. 10 Computational methods can support these analyses; hence the aim of this systematic review was to gain insight into the future research directions needed to bring automated early-pregnancy ultrasound analysis into clinical practice. In this review the most studied topic was biometry (40%), followed by standard plane detection (29%), segmentation (16%), growth models (7%), visualization (4%), abnormality detection (2%), and quality enhancement (2%). We observed a focus on fully automated learning-based methods, as 76% of the studies used an automatic method and 62% used a learning-based method. However, of the 17 studies using proprietary software available in clinical practice only one is learning-based. 26 Hence, automated learning-based methods are being developed, but are not yet widely integrated in software used in clinical practice. A possible explanation is that early brain ultrasonography is not yet standard practice worldwide, because it requires a high level of expertise, which is not available in all clinical settings. 10 The fact that early brain ultrasonography is not yet widely part of clinical practice is reflected in this review: most studies do not evaluate their method using data from clinical routine practice, as the data source is either unclear (15%) or the data were acquired for research purposes (40%).
Moreover, only 13% of the studies used clinical routine data depicting abnormal development in the development of their method, and none of these studies were learning-based. However, abnormal development often leads to structural malformations of the brain, which may be wrongly handled by learning-based methods that are not trained and evaluated for these cases. Hence, our first recommendation is that there should be more focus on developing and evaluating automated learning-based methods using clinical routine data depicting both normal and abnormal development.
Evaluation on clinical routine data shows the potential benefit a computational method can have in terms of accuracy and time needed, which can lead to integration by commercial parties into already widely used software.
We observed that only 31% of the included studies focused on the first trimester. An explanation might be that there is a limited amount of ultrasound data available for method development, both from clinical practice and research. This is due to the fact that the recommendation by the ISUOG to perform ultrasonography of the brain in this period is fairly recent and that only two studies shared their data publicly. The dataset by van den Heuvel et al. consists of data covering all three trimesters and can therefore be used to extend methods that were initially developed for the second and third trimester. However, this dataset consists only of 2D slices of the trans-ventricular plane, which are not usable for all studies presented in this review. The second publicly available dataset, by Burgos-Artizzu et al., does not cover the first trimester, as it starts at gestational week 18. 23 Regarding the balance between 2D and 3D ultrasonography, we found 23 (43%) studies using 2D and 31 (57%) studies using 3D ultrasonography. Furthermore, we observed that for the studies using 2D ultrasonography the majority used an automatic (96%), learning-based (65%) method, which was not the case for 3D ultrasonography. This can partly be explained by the two aforementioned publicly available 2D datasets, 23,44 and partly by the fact that 3D ultrasonography is not yet widely integrated in clinical practice, 11 which may lead to fewer available annotations. Hence, the availability of a dataset that also contains 3D ultrasound would be beneficial to push the development of automatic, learning-based methods.
Sharing data publicly is in some cases impossible due to privacy regulations; therefore, another good option is to share the program source code. Hence, our second recommendation is that studies should make their dataset and program source code publicly available, especially for 3D ultrasonography during the first trimester. Having the program source code available would make comparison between methods easier, since every research institute can repeat the analysis on their own available data. This was done for none of the studies in the review, but is rapidly becoming the standard as shown in the systematic review of Shen et al., where over the last 10 years the number of open source GitHub repositories providing code for medical applications had an annual growth rate of 55%. 79 Another promising approach is federated learning, where learning-based models are trained locally and only the locally learned models are shared and aggregated. 80 An additional challenge for 3D ultrasonography is that prior to biometry, growth modelling, abnormality detection and visualization the required standard plane must be detected. Only three (5%) biometry studies in this review detected the appropriate standard plane; all other studies assumed its availability. 25,30,47 However, in clinical practice the appropriate standard planes are not available and have to be found manually by the sonographer. Hence, for adoption in clinical practice, the integration of standard plane detection prior to other tasks is a topic that should be studied in more detail.
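The idea behind federated learning mentioned above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the model is a one-dimensional linear fit rather than an ultrasound model, the aggregation rule is a sample-weighted average of the local parameters (in the style of federated averaging), and the two "sites" hold made-up numbers. The source does not specify any of these details.

```python
def local_update(weights, data, lr=0.1, epochs=20):
    """Train y ≈ w*x + b on one site's private data with gradient descent.

    Only the resulting parameters leave the site, never the data.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def federated_average(local_weights, sample_counts):
    """Aggregate local models, weighting each by its number of samples."""
    total = sum(sample_counts)
    n_params = len(local_weights[0])
    return [
        sum(w[i] * n for w, n in zip(local_weights, sample_counts)) / total
        for i in range(n_params)
    ]


def train_federated(sites, rounds=50):
    """Alternate local training and central aggregation for several rounds."""
    global_weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in sites]
        global_weights = federated_average(updates, [len(d) for d in sites])
    return global_weights


# Two hypothetical sites whose private data both follow y = 2x + 1.
site_a = [(-2.0, -3.0), (-1.0, -1.0), (0.0, 1.0)]
site_b = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
slope, intercept = train_federated([site_a, site_b])
```

In this toy setting the shared model approaches the common slope and intercept although neither site ever sees the other's data points, which is the property that makes the approach attractive when privacy regulations prevent sharing images.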
All studies adequately reported their outcomes qualitatively and/or quantitatively, and additionally 49% of the studies had multiple raters or compared their results to known clinical values. Regarding the evaluation of the learning-based methods, 76% used an external test set and 20% additionally performed cross-validation. Furthermore, for 88% of the studies the annotations were made by, or under supervision of, one or multiple clinical experts. We found that 16 (29%) studies reported the computational time, which ranged from 70 microseconds to 25 min. However, when a method is commercialized, optimization steps are taken to minimize the computation time. Furthermore, different computational resources and data types (2D/3D) were used, which also dramatically influences the computational time. Therefore, the computation times reported by the studies cannot be compared directly, and should be seen as an upper estimate for the possible computation time in clinical practice. For the adoption of such a method in clinical practice, the computation should be at least as fast as manual analysis.
Finally, for the bias assessment of the included studies, we obtained a relatively low median score of 5 out of 9. This is due to the fact that the ErasmusAGE score was initially designed for epidemiological studies, and although it can be adapted, the score is biased towards this type of research. However, currently there is no quality score tailored for computational methods available. Therefore, we have chosen to adapt the ErasmusAGE score, since it is general, well validated and covers key points such as description of methodology and quality of evaluation. As a consequence, in our review, studies scored lower due to the fact that only 5 (9%) of the studies used longitudinal data and 19 (35%) studies did not adjust for or analyse any confounding factors. Although the usage of longitudinal data is not necessary in the evaluation and development of computational methods, in some cases, such as growth models, it offers more insight. For the other topics, such as biometry, segmentation and standard plane detection, the absence of longitudinal data is too heavily penalized here. Regarding the analysis of the influence of confounding factors: it is known that image quality of ultrasound varies widely since it is operator and vendor dependent 15 and is influenced by the BMI of the mother. Other challenging aspects of ultrasound images of the fetus are the rapid development during early pregnancy and movements during acquisition. Hence, our third recommendation is that every computational method should be evaluated in terms of robustness to at least these key confounders.
We have chosen to focus only on studies involving the brain, since it is clearly visible during early ultrasonography, and its growth and structural development continue throughout pregnancy. However, when abnormal brain development occurs, this may affect the growth and development of the entire embryo and fetus, and thus may also become apparent when monitoring the growth and development of other organs, or the embryo and placenta as a whole. Similarly, other abnormalities not related to the brain, such as spinal and cardiac congenital defects, could affect the development of the brain. The influence of abnormalities in other organs should therefore be taken into account when monitoring growth and development of the brain.
We compared our findings to related systematic reviews, and observed that Fiorentino et al. reviewed deep learning methods for fetal ultrasound of all gestational ages. 16 They found, in line with our findings, that most studies are about biometry and standard plane detection, and were mainly applied to second and third trimester data. The most studied topic was the cardiac system, followed by the brain. Furthermore, they found three public datasets, two of which we found as well, 23,44 and one additional public dataset by Rueda et al. for head and femur segmentation in gestational weeks 21, 28 and 32. 81 Fiorentino et al. stressed the importance of automated analysis of first trimester data, as it can be used to determine gestational age, whereas biometry at later gestation can only be used to monitor growth progress.
Although less common, MR imaging can also be used during pregnancy. We compared our findings to systematic reviews about prenatal and neonatal MR imaging of the brain, and firstly found that segmentation is a well-studied topic, with 33 [20], 16 [17], and 14 [18] automatic methods identified, respectively. Secondly, Oishi et al. found 16 publicly available atlases, starting at the 20th week of pregnancy, describing normal growth of the fetal and neonatal brain. 19 Finally, for the neonatal and infant brain, Li et al. found 5 datasets and 7 image processing tools which are publicly available. 17 Hence, we conclude that prenatal and neonatal MR imaging of the brain is an active field of research and sets a good example for early brain ultrasonography in terms of making both datasets and program source code publicly available. However, similarly to early ultrasonography, prenatal MR imaging of the brain is mainly focused on the second half of pregnancy.
In summary, we recommend the following to improve the adoption of automated learning-based methods in routine clinical practice for early brain ultrasonography:
1. We recommend that routine clinical data depicting both normal and abnormal development are used in the evaluation of computational methods: this will result in a direct reflection of the effect these methods can have in clinical practice.
2. We recommend that studies make their dataset and program source code publicly available, especially for 3D ultrasonography during the first trimester. Sharing code and/or datasets allows researchers of other institutes to evaluate and extend already existing methods, for example by integration of different tasks such as standard plane detection and biometry.
3. We recommend that studies pay more attention to the influence of the key confounding factors, gestational age (GA), image quality and body mass index, on the accuracy of their computational methods.
Bringing automatic methods to routine clinical practice will not only drastically reduce the time needed for measurements of the brain and for detection of structural abnormalities, but it will also enable large-scale data-driven model development. These models may provide more detailed insight into the factors, such as lifestyle and epigenetics, that influence growth and development of the fetus. On the one hand, this insight could lead to earlier and better diagnosis of neuro-developmental disorders, which positively influences treatment. On the other hand, this insight could also contribute to prevention of neuro-developmental disorders, for example by introducing periconceptional lifestyle coaching focusing on the factors that influence growth and development. [82][83][84][85][86][87] Hence, introducing automatic methods to routine clinical practice, especially targeted at early pregnancy, may ultimately lead to better neuro-development of the fetus.

Contributors
All authors were responsible for the concept and design. MR, SK, RST and WN supervised the project. MR and WB did the study selection and data extraction. AK, MR, SK and WB contributed to data analysis and interpretation. WB wrote the original draft of the manuscript with input from MR. All authors contributed to critical revision of the manuscript, and all authors approved the manuscript. All authors had access to the data presented in this paper and the supplementary material prior to submission.

Data sharing statement
The search strategy and the list of papers excluded during full-text screening are available in the supplementary material; any additional data are available on request.

Declaration of interests
WN is founder, scientific lead and was stockholder of Quantib BV. Wiro Niessen is a board member of the Technical Branch of the Dutch Science Foundation (NWO-TTW). All other authors declare no conflicts of interest.